424Rec'dPCT/PTO 2 6 JUL 20Q0 



FORM PTO- 1 390 (Modified) U S. DEPARTM ENT OF COMMERCE PATENT AND TRADEMARK OFFICE 
(REV t 1-9S) 

TRANSMITTAL LETTER TO THE UNITED STATES 
DFSTGNATFD/ELECTED OFFICE CDO/EO/US') 
CONCERNING A FILING UNDER 35 U.S.C. 371 


ATTORNEY'S DOCKET NUMBFR 

27428/36596 


U S. ASEUCATION NO Op ig^OWN^EE 37 CFR 

09/60116/ 


INTERNATIONAL APPLICATION NO INTERNATIONAL FILING DATE 
PCT/DE99/00213 28 July 1999 


PRIORITY DATE CLAIMED 

31 January 1998 


TITLE OF INVENTION 

COMPRESSION OF DATA WITH SYNTACTIC STRUCTURE 


APPLICANT(S) FOR DO/EO/US 

PETER ECK„ ROLF MATZNER and CHANGSONG XIE 



'Applicant herewith submits to the United States Designated/Elected Office (DO/EO/US) the following items and other mformation. 
This is a FIRST submission of items conceming a fihng under 35 U.S.C. 37 L 

This is a SECOND or SUBSEQUENT submission of items conceming a filing under 35 U.S.C. 371. 



1 5. 



2. □ 
3- □ 



6. m 

7. □ 



9. □ 

10. □ 

11. □ 

12. □ 

Items 



13. 


□ 


14. 


□ 


15. 




16. 


□ 


17. 




18. 


□ 


19. 




20. 


□ 



This is an express request to begin national examination procedures (35 U.S.C. 371 (f)) at any time rather than delay 
examination until the expiration of the applicable time limit set in 35 U.S.C. 371(b) and PCT Articles 22 and 39(1). 

A proper Demand for International Preliminary Examination was made by the 19th month from the earliest claimed priority date. 

A copy of the International Application as filed (35 U.S.C. 371 (c) (2)) 

a. n is transmitted herewith (required only if not transmitted by the International Bureau). 

b. ^ has been transmitted by the International Bureau. 

c. n is not required, as the application was filed in the United States Receiving Office (RO/US). 
A translation of the International Application into English (35 U.S.C. 371 (c)(2)). 

A copy of the International Search Report (PCT/ISA/210). 

Amendments to the claims of the International Application under PCT Article 19 (35 U.S.C. 371 (c)(3)) 

a. □ are transmitted herewith (required only if not transmitted by the International Bureau). 

b. □ have been transmitted by the International Bureau. 

c. □ have not been made; however, the time limit for making such amendments has NOT expired. 

d. have not been made and will not be made. 

A translation of the amendments to the claims under PCT Article 19 (35 U.S.C, 371(c)(3)) 

An oath or declaration of the inventor(s) (35 U.S.C. 371 (c)(4)). 

A copy of the International Preliminary Examination Report (PCT/IPEA/409). 

A translation of the annexes to the International Preliminary Examination Report under PCT Article 36 
(35 U.S.C. 371 (c)(5)). 

13 to 20 below concern document(s) or information included: 

An Information Disclosure Statement under 37 CFR 1 .97 and 1.98. 

An assignment document for recording. A separate cover sheet in compliance with 37 CFR 3.28 and 3.31 is included. 
A FIRST preliminary amendment. 

A SECOND or SUBSEQUENT preliminary amendment. 
A substitute specification. 

A change of power of attorney and/or address letter. 
Certificate of Mailing by Express Mail 
Other items or information: 



Page 1 of 2 



PCTUS1/REV03 



534 Rec'd PCT/PTQ 2 8 JUIZQQO 



U.S. APPLICATION NO (IF KNOWN, StE 37l'FR' 



09/6011 



INTERNATIONAL APPLICATION NO 

PCT/DE99/00213 



ATTORNEY'S DOCKET NUMBER 
27428/36596 



21 . The following fees are submitted . 

BASIC NATIONAL FEE ( 37 CFR T492 (a) (1) - (5)) : 

□ Neither international preliminary exammation fee (37 CFR L482) nor 
international search fee (37 CFR 1 445(a)(2) paid to USPTO 

and International Search Report not prepared by the EPO or JPG 

^ International preliminary examination fee (37 CFR 1 .482) not paid to 

USPTO but Internation Search Report prepared by the EPO or JPO 

□ International preliminary examination fee (37 CFR 1 .482) not paid to USPTO 
but international search fee (37 CFR 1 445(a)(2)) paid to USPTO 

□ International preliminary examination fee paid to USPTO (37 CFR 1 .482) 
but all claims did not satisfy provisions of PCT Article 33(1 )-(4) 



Surcharge of $130.00 for furnishing the oath or declaration later than 
months from the earliest claimed priority date (37 CFR 1 .492 (e)). 



$970.00 



$840.00 



$690.00 



$670.00 



□ 



International preliminary examination fee paid to USPTO (37 CFR 1.482) 
and all claims satisfied provisions of PCT Article 33(1 )-(4) 



$96.00 



ENTER APPROPRIATE BASIC FEE AMOUNT = 



□ 20 



□ 30 



CLAIMS 



NUMBER FILED 



NUMBER EXTRA 



RATE 



CALCULATIONS PTO USE ONLY 



$840.00 



$0.00 



Total claims 



7 



-20 = 



0 



$18.00 



$0.00 



Independent claims 



- 3^ 



$78.00 



so.oo 



^Multiple Dependent Claims (check if applicable). 



□ 



so.oo 



TOTAL OF ABOVE CALCULATIONS = 



$840.00 



Reduction of 1/2 for filing by small entity, if applicable. Verified Small Entity Statement 
hiust also be filed (Note 37 CFR 1 .9, 1 .27, t 28) (check if applicable). 



□ 



$0.00 



SUBTOTAL = 



$840.00 



Processing fee of $130.00 for furnishing the English translation later than 
months from the earliest claimed priority date (37 CFR 1 .492 (f)). 



□ 20 



□ 30 



$0.00 



TOTAL NATIONAL FEE = 



$840.00 



Fee for recording the enclosed assignment (37 CFR 1 .21(h)). The assignment must be 
accompanied by an appropriate cover sheet (37 CFR 3.28, 3.3 1) (check if applicable). 



□ 



$0.00 



TOTAL FEES ENCLOSED = 



$840.00 



Amount to be: 
refunded 



charged 



la A check in the amount of $840.00 

Q Please charge my Deposit Account No. 

A duplicate copy of this sheet is enclosed. 



to cover the above fees is enclosed. 



m the amount of 



to cover the above fees. 



3 The Commissioner is hereby authorized to charge any fees which may be required, or credit any overpayment 
to Deposit Account No. 13-2855 A duplicate copy of this sheet is enclosed. 

NOTE: Where an appropriate time limit under 37 CFR 1.494 or 1.495 has not been met, a petition to revive (37 CFR 
1.137(a) or (b)) must be filed and granted to restore the application to pending status. 

SEND ALL CORRESPONDENCE TO: 



Nate F. Scarpelli 

MARSHALL, O'TOOLE, GERSTEIN, 

MURRAY & BORUN 
233 South Wacker Drive 
6300 Sears Tower 
Chicago, Illinois 60606-6402 
Tel.: (312)474-6300 
Fax: (312)474-0448 




SIGNATURE 
Nate F. Scarpelli 



NAME 
22,320 



REGISTRATION NUMBER 

July 28, 2000 

DATE 



Page 2 of 2 



ffo. 2979 P, 3/3 
Froin:0323 



PATENT 

Attorne/s Docket No: 27428/36596 

Applicant or Patentee: PETER ECK, ET AL. 

Serial or Patent No: 09/601 ,167 

Filed or Issued: July 28, 2000 

For: COMPRESSION OF DATA WITH SYNTACTIC STRUCTURE 

VERIFIED STATEMENT (DECLARATION) CLAIMING SMALL ENTITY 
STATUS (37 CFR 1.9(f) and 127(c)) SMALL BUSINESS CONCERN 

! hereby declare that ( am 

□ The owner of the small business concern identified below: 

J& An official of the small business concern empowered to act on behalf of 
the concern identified below: 

NAME OF CONCERN Syntion AG 

ADDRESS OF BUSINESS Leonrodplatz 2, 80636 Munich, Federal 
Republic of Germany 

I hereby declare that the above-identified small business concern qualifies as a small 
business concern as defined in 13 CFR 121.12, and reproduced in 37 CFR 1.9(d), for 
purposes of paying reduced fees under Section 41 (a) and (b) of Title 35, United States 
Code, in that the number of employees of the concern, including those of its affiliates, 
does not exceed 500 persons. For purposes of this statement, (1) the number of 
employees of the business concern is the average over the previous fiscal year of the 
concern of the persons employed on a fuit-time, part-time or temporary basis during 
each of the pay periods of the fiscal year, and (2) concerns are affiliates of each other 
when either, directly or indirectly, one concern controls or has the power to control the 
other, era third-party or parties controls or has the power to control both. 

I hereby declare that rights under contract or law have been conveyed to, and remain 
with, the small business concern identified above with regard to the invention, entitled 
COMPRESSION OF DATA WITH SYNTACTIC STRUCTURE, by inventor(s) Peter Eck, 
Rolf Matzner. and Changsong Xie, 

described in 

□ The specification filed herewith. 

S Application Serial No. Q9/601.167 filed July 28, 2000 . 

□ Patent No. . issued 




Ant 21. 2000 4;14fM MARSHALL, O TOOLE 

V 



Frofn;0823 



If the rights held by the above-identified small business concern are not exclusive, each 
individual, concern or organization having rights to the invention is listed below* and no 
rights to the invention are held by any person, other than the inventor, who would not 
qualify as an independent inventor under 37 CFR 1.9(c), if that person made the 
invention, or by any concern which would not qualify as a small business concern under 



37 CFR 1.9(d) or a nonprofit organization under 37 CFR 1-9(e). 


*NOTE: 


Separate verified statements are required from each named person, concern or 




organization having nghts to the invention averring to their status as small entities. 




(37 CFR 1.27). 


NAME: 




ADDRESS: 


□ INDIVIDUAL 


□ SMALL BUSINESS CONCERN □ NONPROFIT ORGAN I2ATJ0N 


NAME: 




ADDRESS: 



□ INDIVIDUAL □ SMALL BUSINESS CONCERN □ NONPROFIT ORGANIZATION 



I acknowledge the duty to file, in this application or patent, notification of any change 
in status resulting in loss of entitlement to small entity status prior to paying, or at the 
time of paying, the earliest of the issue fee or any maintenance fee due after the date 
on which status as a small entity is no longer appropriate. (37 CFR 1.28(b)). 

I hereby declare that ail statements made herein of my own knowledge are true and 
that all statements made on information and belief are believed to be tnje; and further 
that these statements were made with the knowledge that willful false statements and 
the like so made are punishable by fine or imprisonment or both, under Section 1001 
of Title 18 of the United States Code, and that such willful false statements may 
jeopardize the validity of the application, any patent issuing thereon, or any patent to 
which this verified statement is directed, 

NAME OF PERSON SIGNING: RocF MATZhJeR, 

TITLE OF PERSON OTHER THAN OWNER: ^^'^'^ T^CHsioiocy oFFf^e-R 

Date 2^' Ai^^^ 7oov 



ADDRESS Of PERSON SIGNING: 



SIGNATURE: 




-2- 



09/f^0ll67 

534Rer/dPCT/FTv 28JUUOO0 

PATENT 

27428/36596 

IN THE UNITED STATES 
PATENT AND TRADEMARK OFFICE 



APPLICANT: 

PETER ECK, ET AL . 

SERIAL NO. : To Be Determined 

FILED: Filed Herewith 

FOR: COMPRESSION OF DATA 
WITH SYNTACTIC STRUCTURE 

Based on: PCT/DE99/00213 
Filed: 31 July 1997 
and based on 19803845.3 
filed 31 January 1998 

GROUP ART UNIT: To Be 

Determined 

EXAMINER: To Be Determined 



"EXPRESS MAIL" mailing label 
No. EL 566462735 US. 
Date of Deposit: 
July 28, 2 000 
I hereby certify that this 
paper (or fee) is being 
deposited with the United 
States Postal Service 
"EXPRESS MAIL POST OFFICE TO 
ADDRESSEE" service under 3 7 
CFR §1.10 on the date 
indicated above and is 
addressed to: Assistant 
Commissioner for Patents, 
Washington, D.C. 20231 



Laura Frasher 



PRELIMINARY AMENDMENT "A" ACCOMPANYING 
SECTION 371 NATIONAL STAGE APPLICATION AS FILED 

As s i s t an t Commi s s i one r 

for Patents 
BOX PCT 

Washington, D.C. 20231 
Sir: 

Please amend the accompanying Section 371 national 
stage application as filed as follows: 



IN THE SPECIFICATION : 

In place of the enclosed literal translation of the 
specification of the international application, marked 
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"APPLICATION WITH LITERAL TRANSLATION ^^ , please substitute the 
enclosed specification marked " SUBSTITUTE SPECIFICATION ^^ . 

The substitute specification does not include any 
new matter. Amendments made to the specification text are as 
follows : 

On page 3, line 20 "4.1 Summary of the Invention" 

inserted . 

On page 3, lines 22-32 and page 4; lines 1-3 a 
statement equivalent to Claim 1 has been inserted. 

On page 4, lines 5-27 a statement equivalent to 
Claim 8 has been inserted. 

On page 5, lines 2-22 a statement equivalent to 
Claim 9 has been inserted. 

On page 10, line 11 "6.3 Brief Description of the 
Drawings" inserted . 

On page 11, line 18 "6.3" change to "6.4". 

On page 12, line 21 "6.4" changed to "6.5". 

On page 13, line 11 "6,5" changed to "6.6". 

On page 13, line 20 "Claim 22" corrected to "these 

claims" . 

IN THE CLAIMS : 

Please cancel claims 1-10 without prejudice. 

Please add the following claims 11-17: 
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--11. A method of transmitting a text represented 
by digital data; the structure of which is defined by a 
grammar with grammatical rules, from a transmitter to a 
receiver comprising the steps of: 

(a) in the transmitter, for the purpose of 
encoding, the grammatical rules contained in the text are 
converted to a sequence of syntax-directed coding (SDC) 
symbols by parsing the text and generating a parse tree, such 
that to each node in this parse tree there is attributed an 
SDC symbol, which out of all the rules permitted by the 
grammar at this site unambiguously identifies the rule that is 
actually contained in the text, and such that the SDC symbols 
are concatenated, according to a fixed order of traversing all 
nodes of the parse tree, to form a linear sequence of SDC 
symbols ; 

(b) sending the SDC symbol sequence from a 
transmitter to a receiver for storage in the receiver; 

(c) in the receiver, for the purpose of decoding, 
in a stack machine the grammatical rule corresponding to each 
stored SDC symbol in the sequence of SDC symbols is executed 
in order to generate output data that contain the text, in 
which process an uppermost entry in a stack memory of the 
stack machine is replaced according to a grammar production 
determined by an SDC symbol that has been input, and parts of 
the grammatical rules that cannot yet be completely processed 
are deposited in a stack memory, whereas 
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parts of the production for which substitution is 
complete are processed immediately to form part of an 
executable program for a real processor or a virtual 
machine . - - 

--12. A method as claimed in Claim 11, including 
the steps of: 

(a) initializing the stack machine by depositing a 
specific start symbol or non-terminal symbol into an empty 
stack memory; 

(b) reading an uppermost symbol from the stack 

memory; 

(c) testing whether the read symbol is a terminal 
or a non-terminal symbol; 

(dx) if the read symbol is a terminal symbol, 
outputting the symbol regardless of whether additional symbols 
are present in the stack memory, and continuing the method 
with step (b) above and terminating the method whenever the 
stack memory is empty, or 

(d2) if the read symbol is a non- terminal symbol, 
reading the next SDC symbol from the input stream; 

(e) dependent on what SDC symbol has been read, 
selecting only one alternative or chain of terminal and/or 
non-terminal symbols out of the set of alternatively 
applicable replacement rules or productions that are valid for 
the non-terminal symbol currently being processed; 
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(f) placing the chain of terminal and/or non- 
terminal symbols into the stack memory and then continuing the 
method with step (b) above .- - 

--13. A method as claimed in Claim 11, wherein 

(a) to each node in the parse tree there is 
attributed an SDC symbol and the probability distribution of 
all the SDC symbols possible in this node, 

(b) the sequence of SDC symbols is subjected to 
entropy encoding in conformity with the associated probability 
distributions , and 

(c) the entropy decoding is carried out with a 
probability distribution of SDC symbols identical to that used 
for the entropy encoding. - - 

--14. A method as claimed in Claim 13, wherein 

(a) the probability distribution of the rules that 
can be applied in a node, starting from an initial 
distribution, is adapted at each appearance of an SDC symbol 
in such a way that the probability of the SDC symbol that 
appears is increased and the probability of all other symbols 
is correspondingly reduced, 

(b) the currently valid distribution of occurrence 
probabilities is assigned to the SDC symbols of the associated 
node type , 
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(c) the probability distribution of all SDC symbols 
in the current node, together with the SDC symbol to be 
encoded, forms a model for an arithmetic encoding, 

(d) during decoding an end of the text is 
recognized by the fact that the stack memory is empty, and 

(e) an End-Of -Message (EOM) symbol required for 
arithmetic coding is eliminated - 

--15. Apparatus for transmitting a text represented 
by digital data, the structure of which is defined by a 
grammar with grammatical rules, from a transmitter to a 
receiver, comprising 

(a) an encoder, which comprises 

(aa) a scanner to transform text consisting of 
a sequence of readable characters into a sequence of terminal 
symbols , 

(ab) a parser to find grammatical rules, the 
successive application of which was originally used to 
generate the sequence of terminal symbols, 

(ac) a mapper, which unambiguously associates 
syntax-directed symbols with the rules identified by the 
parser and outputs these symbols in a fixed sequence, and 

(b) a decoder, which comprises 

(ba) a stack machine that, according to an 
uppermost symbol in the stack memory and any adjacent SDC 
symbol, outputs the already fixed terminal symbol, or deposits 
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the sequence of terminal and/or non-terminal symbols 
associated with the current symbol into the stack memory, and 

(bb) a lexicon that replaces the terminal 
symbols by chains of readable alphanumeric characters, such 
that the decoder immediately processes parts of the production 
for which substitution is complete to form part of an 
executable program for a real processor or a virtual 
machine . - - 

--16. Apparatus as claimed in Claim 15, in which 
the text transformed by the scanner into a sequence of 
terminal symbols is a program present in the source text or in 
a form derived from the source text by a preprocessor, and in 
which a code generator is provided that generates from the 
sequence of terminal symbols an executable machine code or an 
intermediate code to be executed on a virtual machine, - - 

--17. Apparatus as claimed in Claim 15, which 

comprises 

(a) in the transmitter 

(aa) a table that contains the probability 
distributions of the SDC symbols for each node type in a parse 
tree generated by a parser and the contents of which are 
established at initialization with fixed initial probability 
distributions for each node type. 



- 7 - 



(ab) an adapter that updates the probability 
distribution of the SDC symbols for a node type valid at the 
current moment with reference to the existing probability 
distribution, the SDC symbol to be encoded and the current 
node type, and enters this new probability distribution into 
the table, and 

(ac) an arithmetic encoder that encodes the SDC 
symbol currently to be encoded with the currently valid 
probability distribution supplied by the adapter; 

(b) in the receiver 

(ba) a table that contains the probability 
distributions of the SDC symbols for each node type, the 
contents of which are established at initialization with fixed 
initial probability distributions for each node type, 

(bb) an adapter that updates the probability 
distribution of the SDC symbols for the node type valid at the 
current moment, as established by the stack machine, with 
reference to the existing probability distribution, the SDC 
symbol to be decoded and the current node type, and enters 
this new probability distribution into the table, and 

(be) an arithmetic decoder which, with 
reference to the currently valid probability distribution of 
the current node type supplied by the adapter, decodes the 
next SDC symbol and sends it to the stack machine for further 
processing . - - 
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REMARKS 

The enclosed translation of the international 
application marked ^^APPLICATION WITH LITERAL TRANSLATION" 
includes the translated specification, claims, abstract, and 
drawings text . 

The substitute specification marked ^^SUBSTITUTE 
SPECIFICATION"' includes a revised specification and abstract 
for use in the examination of this application and does not 
include any new matter with respect to the literal 
translation . 

It is requested that the Examiner uses the 
^^SUBSTITUTE SPECIFICATION" in place of the literal translation 
during the examination of this application. 



Respectfully submitted, 

Nate F. Scarpelliy^eg . #22,320 
MARSHALL, O' TOOLE, GERSTEIN, 

MURRAY Sc BORUN 
63 00 Sears Tower 
233 S. Wacker Drive 
Chicago, Illinois 60606 
Tel . : (312) 474-6300 
Fax: (312) 474-0448 
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COMPRESSION OF DATA WITH SYNTACTIC STRUCTURE 

1 . Introduction to the General Topic 

5 The coding of source files has important technical 
applications in the area of transmission and storage of 
information. The desire to have a low data rate in the 
transmission of information, and a small space 
requirement in the storage of information, has given 
10 rise to the demand for a form of information coding that 
is as brief as possible, i.e. one with little or, 
ideally, no redundancy. 

2 . State of the Art 

15 

2.1. Forms of Redundancy 

Redundancy in streams of source symbols takes two basic 
forms: it can appear as a differential frequency of 
20 occurrence of the elements in the set of source symbols, 
and in the form of statistical dependencies between 
temporally separated source symbols, which usually are 
caused by source-specific formation laws for the 
construction of a text as a sequence of source symbols. 

25 

3 . Disadvantages of Known Methods 

For the compression of messages from sources that suffer 
from redundancy and emit source symbols that are not 
30 uniformly distributed but are statistically independent, 
a procedure according to Huffman [1] is known. Being 
specially intended for this type of source, the 
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procedure enables maximal compression only under the 
extremely favorable condition that the statistical 
properties of the source are known or can be reliably 
estimated . 

5 

Whereas estimation of the probabilities of occurrence of 
individual source symbols, which is necessary for 
Huffman coding, can be done with sufficient accuracy 
even in the case of moderately long texts, the 

10 estimation of statistical dependencies between 
temporally separate source symbols as a rule requires 
extensive observation of the source, which goes far 
beyond what is feasible both technically ( storage 
capacity) and in terms of observation time. For this 

15 reason the known or similar procedures always involve 
making assumptions about the underlying formation law or 
crucial characteristics thereof, and then of course are 
suitable only for sources with this formation law . This 
qualification also applies to the Lempel-Ziv (LZ ) 

20 procedure for data compression [2] , which has been known 
since the end of the 1970^ s. This procedure is suitable 
for the compression of arbitrary linear symbol 
sequences; that is, it can be applied with no knowledge 
of the statistical properties of the information source. 

25 The main areas of application today are the compression 
of pictorial data (GIF file format) and arbitrary files 
(archiving programs gnuzip, pkzip) . In contrast to 
Huffman coding, this procedure utilizes statistical 
dependencies between temporally successive symbols, by 

30 taking into account repeatedly occurring subsequences of 
symbols (strings) . Hence LZ compression is suitable only 
for sources, the formation law of which leads to the 
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frequent occurrence of particular strings. The 
redundancy introduced into a text by following 
grammatical rules during its construction is not 
accessible to the known compression procedures because 
5 of the recursive structure of grammatical rules^ since 
these procedures merely employ symbol frequencies or 
repetitive chains of alphanumeric characters for 
compression . 



10 4. Definition of the Object of the Invention 

The object of the present invention is to provide a 
method and apparatus for implementing the method that 
enable a text consisting of a stream of alphanumeric 
15 characters, and is constructed according to grammatical 
rules, to be encoded in such a way as to remove the 
redundancy produced as a result of the limitation of all 
possible character sequences by the grammar. 



20 4.1 Summary of the Invention 



According to a first aspect of the present invention 
there is provided a method for the compressed 
transmission and/or storage of a text represented by 

25 digital data, the structure of which is defined by a 
grammar, wherein 

prior to storage and/or transmission the text is 
converted into a linear sequence of symbols that specify 
the successive application of grammatical rules to form 

30 the text, and 

to decode the text after the storage and/or 
transmission, for every symbol thus produced by syntax- 
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directed coding (SDC) the grammatical rule corresponding 
to that particular SDC symbol is performed and 
particular output data are generated by this rule. 

5 According to a second aspect of the present invention 
there is provided an apparatus for the compressed 
transmission and/or storage of a text represented by 
digital data, the structure of which is defined by a 
grammar, comprising 
10 a scanner to transform the text comprising a 

sequence of readable characters into a sequence of 
terminal symbols, 

a parser to find the grammatical rules, the 
successive application of which was originally used to 
15 generate the sequence of terminal symbols, 

a mapper, which unambiguously associates syntax- 
directed symbols with the rules identified by the parser 
and outputs these symbols in a fixed sequence, and 
a decoder comprising 
20 (a) a stack machine which, according to the 

uppermost symbol in the stack memory and any adjacent 
SDC symbol, outputs the already fixed terminal symbol, 
or deposits the sequence of terminal and/or non-terminal 
symbols associated with the current symbol into the 
25 stack memory, and 

(b) a lexicon that replaces the terminal symbols by 
chains of readable alphanumeric characters. 
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According to a third aspect of the present invention 
there is provided an encoder comprising 

a scanner to transform a program, present in the 
5 source text or in a form derived from the source text by 
a preprocessor, into a sequence of terminal symbols, 

a parser to find the grammatical rules , the 
successive application of which was originally used to 
generate the sequence of terminal symbols, 

10 a mapper, which unambiguously associates syntax- 

directed symbols with the rules identified by th^ parser 
and outputs these symbols in a fixed sequence, 
and a decoder comprising 

a stack machine which, according to the uppermost 

15 symbol in the stack memory and any associated SDC 
symbol, outputs the already fixed terminal symbol, or 
deposits the sequence of terminal and/or non-terminal 
symbols associated with the current symbol into the 
stack memory, and 

20 a code generator that generates from the sequence 

of terminal symbols executable machine code, or 
intermediate code to be executed on a virtual machine. 

5 . Advantages of the Solution in Accordance with the 
2 5 Invention 

With the solution in accordance with the invention, 
enabling data reduction in dependence on the 
programming-language syntax employed, it now becomes 
30 possible in an especially advantageous manner to remove, 
asymptotically almost completely, redundancy produced in 
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texts during their construction because of the need to 
follow certain grammatical rules. 



The invention has proved this by analytical calculation 
5 of the capacity of its data stream, encoded in a syntax- 
directed manner, using the example of a language with 
characteristics typical of programming languages 
(mathematical expressions, if-then-else construct, 
etc.)- In this process it was shown that an appreciable 
10 amount of the redundancy that conventional methods 
(Lempel-Ziv, Huffman) are designed to address has not 
previously been eliminated by these methods. 



The new method of syntax-directed coding, in contrast, 
15 is capable of reducing the amount of data by an 
additional ca . one-half in comparison, for instance, to 
a result obtained with the Lempel-Ziv compression 
procedure . 



20 Thus the method in accordance with the invention, 
together with the apparatus specified for its 
implementation, is predestined for all applications that 
involve the transmission or storage of syntactically 
structured texts. Examples include the transmission of 

25 Java applets in the Internet or an intranet, the 
transmission or storage of Postscript files, the 
transmission and storage of iyiPEG4-encoded video data, 
and the use of higher protocols in communication. 
Especially for the transmission of programs, such as 

30 Java applets, syntax-directed coding (SDC) is 
particularly suitable because the parser is a component 
of both a compiler and an SDC coder, so that SDC can be 



- 7 - 



organically integrated into the data flow from the 
writing to the running of the program. 

6 . Mode of operation of the invention 

5 

6.1 Syntactically structured source 

The invention takes as a point of departure a formal 
language represented by a set of chains of alphanumeric 

10 characters. The text is put together according to a 
grammar assigned to the language. A grammar is both the 
mathematical system used to define a formal language^ 
and a set of rules to determine the syntactic validity 
of a specific sentence. Only context-free grammars are 

15 considered here, because higher programming languages 
(C, C++, Java, etc.) are described almost exclusively by 
context-free grammars. The following definitions apply: 

Definition 1 A context-free grammar is a 4~tuple G = 
20 (N,T,P,S) , where 

N is the set of non-terminal symbols ^ 
T is the set of terminal symbols ^ and 
P is the relation N x (N U T) ^ . 

S is a special non-terminal symbol^ also called 
25 start symbol. 

Definition 2 An element {A^ J3) e P is called production 
and is abbreviated A --^ p. 

30 A production (or derivation) is thus a replacement rule 
for a non-terminal symbol A, such that this non-terminal 
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symbol is replaced by a string (chain) P composed of 
(non-terminal and) terminal symbols. 

When there exist several productions (A, (5i) , (A^ 132) r 
5 ...f (A^ Pn) e for this we write: 

A ^ pi 

I P2 
I /3n 

10 

Definition 3 The set P^o Q P of all possible derivations 
for a non-terminal symbol Aq 

Pao = { {A,J3) G P : A = Ao) 
is also called the set of the alternatives for Aq. The 
15 decision to use a specific production {AorPk) ^ Paof ^ 
l,2f '--r ^) is called selection. 

Example 1 All arithmetic expressions in the variables a 
with the operations + and ^ as well as arbitrarily 
20 nested brackets (,) are a formal language generated by 
the grammar 

G = {{E,T,F}, {a,+,^, (, ) }rP.E) . 
P consists of the following productions : 

E ^ E-\-T 

25 I T 

I F 
F (E) 
I a 



a * a + (a* a + a) * a, for example^ is a valid sentence . 
In generating a sentence, further strings of non- 
terminal and, where appropriate, terminal symbols are 
derived by application of suitable productions to the 
non-terminal symbols until ultimately the string 
consists only of terminal symbols. In the example given 
above, the valid sentence consists only of the terminal 
symbols (^). 

6.2 Principle of Syntax-Directed Coding 

Information always flows into a text being generated 
when during its construction decisions are made. 
Decisions are to be made when a particular one of 
several possible source symbols is to be chosen or, in 
the case of a syntactically structured source, out of 
several applicable productions A A ^ - - . 

precisely one production actually is applied (selection, 
cf . Definition 3) . 

SDC encodes, instead of the string of terminal symbols, 
the selections in the order in which they were made 
during the construction of this string (the text) . 

Definition 4 2 = {ai, as, an} is called SDC 

alphabet ^ with 

n = max { I P^l } 
A 

All the elements in each set of alternative Pa are 
numbered (arbitrarily) with SDC symbols ai e 2: 
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Definition 5 Za : Pa ^ is an (arbitrary) injective 
mapping , 

The concrete definition of Za can in general be 
5 dependent on P^. 

As an illustration of the invention the following 
exemplary embodiments are described in detail and shown 
in Figures 1 to 5, as follows. 

10 

6 . 3 Brief Description of the Drawings 

Figure 1 is a block diagram of an SDC transmission 
system 

15 Figure 2 shows alternatives of the grammar G and 

selections to which SDC symbols have been attributed 

Figure 3 is a parse tree of the sentence a * a + (a * 
a + a) ^ a with a resulting linear SDC symbol sequence 
1212221211212222222 
20 Figure 4 is a flow diagram of the stack machines- 

Figure 5 is an example of an SDC transmission 
system with receiver-side source-text output 

Figure 6 is an example of an SDC transmission 
system with executable Java applet or application as 
25 output 

The functioning of a transmission system with SDC is 
illustrated in Figure 1. In principle transmission and 
storage are equivalent with respect to the way the 
30 coding works . 



- 11 - 



The source delivers a syntactically structured stream of 
symbols (source program) . An elementary component of the 
SDC encoder is a parser, the task of which is to 
retrieve from the initially linear sequence of source 
5 symbols the grammatical structure of the sentence (the 
source program) . This grammatical structure is 
represented in general by a parse tree, which in turn 
serves as input to the encoder. The encoder encodes the 
parse tree to generate a linear sequence of SDC symbols, 

10 such that at each moment the current SDC symbol is 
output and transmitted in dependence on the current 
selection at that moment. The SDC decoder first 
reconstructs the parse tree from the linear sequence of 
SDC symbols and then, by traversing the parse tree, 

15 reproduces the sequence of terminal symbols sent out 
from the source. 

6 . 4 Encoding 

20 The elementary task of the parser comprises the 
reconstruction of the selections made by the source 
during construction of the sentence. A parsing step thus 
amounts to determining the specific production (A, J3) g 
Pa that the source had selected for constructing the 

25 text at this point. As stated in Definition 5, the 
encoder assigns to the selection thus determined the 
associated SDC symbol o> = ZaUA^ fi) ) and attributes 
to the corresponding node in the parse tree. 

30 By traversing the parse tree, ultimately the linear 
sequence of SDC symbols is generated and output. In the 
examples given here, a depth-first algorithm is always 
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used for this purpose. The process of encoding will now 
be explained in greater detail with reference to an 
exemplary parse tree and the underlying grammar from 
Example 1. 

5 

Example 2 The context-free grammar of Example 1 is 
assumed. All alternatives ^ i.e. all possibilities for 
deriving each non-terminal symbol^ are shown graphically 
in Figure 2. Because the greatest set of alternatives 
10 comprises precisely two elements ^ the SDC alphabet E = 
{1, 2} suffices. To each selection of all quantities of 
alternatives an SDC symbol is attributed. For example, 
the selection of the specific production T T ^ F is 
assigned the SDC symbol 1. 

15 

A valid sentence in this grammar is thus, for example: 
a ^ a + (a ^ a + a) * a. The parse tree for this sentence 
and the generation of the SDC symbol sequence by depth- 
first traversing are shown in Figure 3. 

20 

6.5 SDC with arithmetic encoding 

The method that has just been described is efficient 
only if 

25 1. \Pa\ = const V A, and 

2. P({Ai j3i)\A) = const V A, (5^, 

where P( (Ai is the conditional probability of 

making the selection that generates y?i from the set of 
30 alternatives Pa. 



- 13 - 



This deficiency can be alleviated by arithmetic encoding 
[3] of the SDC alphabet S as stated in Claims 5 and 7. 
The probability distributions of the SDC symbols used 
for arithmetic encoding should be adapted in an adapter 
5 according to Claim 10 in which case the adaptation must 
be carried out separately for each set of alternatives. 
Instead of the linear sequence of SDC symbols, the bit 
stream generated by the arithmetic encoder is 
transmitted. 

0 

6 . 6 Decoding 



Decoding is the inverse of the encoding process. The 
decoder formats the SDC code back into the original 
15 sentence (strings of terminal symbols) . It operates 
according to the procedure described in Claims 1, 3 and 
4, 5 and is implemented by a stack machine [4] according 
to Claims 8 and 9 . 



20 A flow diagram for a method according to these claims in 
shown in Figure 4. 



To explain operation of the stack machine the following 
abbreviations are employed: 

25 

f5k String of terminal and non-terminal symbols 
(J SDC symbol 

V Terminal or non-terminal symbol 
S Start symbol (non-terminal symbol) 
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The decoding thus proceeds in the following steps (the 
step numbers corresponding to the reference numerals 
shown in Figure 4 : - 

1. The development of a valid sentence according to 
the grammar (a program) begins with the start 
symbol S. The start symbol S is placed on the 
stack . 

2. The uppermost symbol V is read from the stack. 

3. Check whether V is a non-terminal symbol. 

4. From the linear input stream of SDC symbols the 
next SDC symbol cr is read. 

5. In case V Is a non-terminal symbol, the derivation 
with V is further developed. By way of cr a 
selection {V,j3j,) = (a) is made for V from the set 
of alternatives of V Ai y^^l ... \/3n. In accordance 
with the specific selection, V is then replaced by 
the string j3]^, 

6. The string J3]c of symbols (terminal and/or non- 
terminal symbols) is placed on the stack. 

7. If y is a terminal symbol, it is output. 

8. Check whether the stack is empty. 

9. The stack machine terminates when the stack is 
empty . 

The strings of terminal symbols output successively, 
when linked together, finally yield the sentence 
originally encoded from the source. 
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7. Exemplary Embodiment: Preferred Form of Application 

The two exemplary embodiments shown in Figures 5 and 6 
differ merely in the form in which data are sent to the 
5 receiver, namely as an output of source text 
corresponding to the method according to Claim 3 or as a 
direct output of an executable Java applet or an 
executable Java application, corresponding to the method 
according to Claim 4 . 

0 

A description in greater details of these elements and 
their operation in the embodiments shown in Figures 5 
and 6 is as follows. 



15 1. Java source 

A Java source 1 corresponds to an ASCII text based 
on the Java grammar, which consists of Java program 
text plus comments, blank spaces etc. 



20 2. SDC encoder 

The SDC encoder 2 is composed of a compiler front 
end, comprising a scanner 3, a parser 4 and a 
symbol-table memory 6, together with a parse-tree 
processor 5, an adapter 8 and modules for 

25 compression and concatenation of the generated 

data . 



3. Scanner 

The scanner 3 reads the source text, removes the 
30 text sites that are unimportant for the program 

such as documentation, comments, blank spaces etc. 
and sends (terminal) symbols to the parser 4. 



Parser 

The parser 4 interprets the (terminal) symbols from 
the scanner 3, tests whether the symbol sequences 
agree with the grammatical rules for Java and sets 
up the specific parse tree for this program. 

Parse- tree processor 

During the initialization of parse- tree processing 
all nodes of the parse tree are assigned SDC 
symbols specified for each type of node. The 
subsequent traversing of all nodes of the parse 
tree causes an SDC symbol to be output at each 
crossing of a node. These SDC symbols are sent to 
both the adapter 8 and the arithmetic coder 10, In 
addition, the parse-tree processor 5 transmits the 
current node type to the adapter 8. 

Symbol-table memory 

The symbol-table memory 6 contains the identifiers 
of the Java program text extracted by the parser 4. 

Zip- compress ion device 

The content of the symbol table 6 is compressed by 
a method according to Lempel-Ziv-Welch [2] . 

Adapter (transmitter-side) 

During the process of initialization the adapter 8 
enters initial probability distributions for each 
node type into the probability-distribution table 
9. During processing of the SDC symbols received 
from the parse-tree processor 5, and with reference 



to the current node type, at each step the 
probability distribution of all SDC symbols of the 
current node type is adapted and entered into the 
probability-distribution table 9. The adapter 8 
simultaneously makes available to the arithmetic 
encoder 10 the probability distribution associated 
with the SDC symbol currently being processed. 

Table of the probability distributions of the SDC 
symbols 

For each node type, the table 9 contains a 
probability distribution of SDC symbols that is 
peculiar to that node type. 

Arithmetic encoder 

The arithmetic encoder 3 receives from the parse- 
tree processor 5 the next SDC symbol to be encoded 
and encodes it with reference to the probability 
distribution made available by the adapter 8. The 
output of the arithmetic encoder 3 is a binary data 
stream. 

Data concatenation 

The binary data stream from the arithmetic encoder 
10 and the symbol table 6 compressed by the 
compressor 7 are combined to form a data packet. 

HTTP server 

On the HTTP server 12 the data packet created in 
the data concatenation 11 is deposited and made 
available to be called up from a mass storage 
device . 



Intranet /Internet 



HTTP client (Internet browser) 

The browser 14 has the task of downloading the data 
packet from an HTTP server 12 by way of the 
intranet/Internet 13 and initiating the process of 
decoding by the SDC decoder 15. 

SDC decoder 

The SDC decoder 15 comprises substantially of the 
arithmetic decoder 17 and the stack machine 22. The 
result of its operation is the sequence of 
(terminal) symbols that correspond to the original 
Java source code 1 (cf. 3 and 4). 

Data extractor 

Here the binary data stream from the arithmetic 
encoder 10 and the symbol table compressed by the 
zip compressor 7 are extracted from the common data 
packet and sent on, the former to the arithmetic 
decoder 17 and the latter to the zip-decompressor 
18. 

Arithmetic decoder 

The arithmetic decoder 3 adaptively decodes the 
binary data stream produced by data extraction 16^ 
taking into account the probability distributions 
associated with the SDC symbols that have been made 
available by the receiver-side adapter 20^ and at 
each decoding step transmits an SDC symbol to the 
stack machine 22. 
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18 . Zip-decompressor 

The symbol table from the coder 2, compressed by 
the Lempel-Ziv method 7 and received from the data 
extractor 16, is unpacked and entered into the 
5 symbol-table memory of the decoder 19. 

19. Symbol-table memory of the decoder 

20. Adapter (receiver-side) 

10 The initialization of the receiver- side adapter 2 0 

is analogous to that of the transmitter-side 
adapter 8. While decoding proceeds, at each 
decoding step the adapter receives an SDC symbol, 
adapts the probability distribution of the SDC 

15 symbol of the current node type and enters the 

adapted probability distributions into the 
probability-distribution table 21. At the same 
time, the receiver-side adapter 20 supplies to the 
arithmetic decoder 17 the probability distribution 

20 that is valid for the next decoding step. The 

selection of this distribution is determined by the 
node type determined by the stack machine 22 and 
expected in the next decoding step. 

25 21. Table of probability distributions of the SDC 
symbols 

For each node type the table contains an individual 
probability distribution of the SDC symbols of this 
node type. 
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stack machine 

The initialization of the stack machine 22, the 
processing of the stack memory 23 and the output of 
terminal symbols is represented in Figure 4 as a 
flow diagram. In addition the stack machine 22 
transmits the following data to the adapter 20: 

(a) the current node type, for adaptation of the 
probability distributions of the SDC symbols of 
this node type; and 

(b) the next node type, the SDC- symbol probability 
distribution of which the adapter 20 must supply to 
the arithmetic decoder 17 in the next decoding 
step . 

Stack memory 

Stack memory 23 for the stack machine 22. 
Java sink 

Retrieved Java program text without comments, 
superfluous blank spaces etc. 

Code generator 

The code generator 25 corresponds to a compiler 
back end and generates byte code for the JVM 26 in 
dependence on the sequence of (terminal) symbols 
provided by the stack machine 22. 

JVM 

The virtual machine for Java programs (Java Virtual 
Machine) . 
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Key to reference characters 
Reference characters in Figure 4 

j8k String of terminal and non-terminal symbols 
a SDC symbol 

V Terminal or non-terminal symbol 
5 Start symbol (non-terminal symbol) 

1. The development of a valid sentence according to 
the grammar (a program) begins with the start 
symbol 5. The start symbol S is placed on the 
stack . 

2. The uppermost symbol V is read from the stack. 

3. Check whether V is a non-terminal symbol. 

4. From the linear input stream of SDC symbols the 
next SDC symbol a is read. 
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5. If V is a non-terminal symbol, the derivation is 
continued with V. By means of cr a selection {V.pj,) = 
z"^(<t) from the set of alternatives of V ^ Pi\ ... 

is determined for V. According to the particular 
5 selection made, V is replaced by the string j5y^, 

6. The string p}, of symbols (terminal and/or non- 
terminal symbols) is placed on the stack. 

7. If V is a terminal symbol, it is output. 

8. Test whether the stack is empty. 

10 9, The stack machine terminates when the stack is 
empty. 

Reference characters in Figures 5 and 6 

1. Java source 
15 2. SDC encoder 

3. Scanner 

4 . Parser 

5. Parse-tree processor 

6. Symbol-table memory 

20 7. Zip-compression device 

8. Transmitter-side adapter 

9. Table of probability distributions of the SDC 
symbols 

10. Arithmetic encoder 
25 11. Data concatenation 

12. HTTP server 

13 . Intranet /Internet 

14. HTTP client (Internet browser) 

15. SDC decoder 

30 16. Data extractor 

17. Arithmetic decoder 
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18 . Zip-decompressor 

19. Symbol-table memory of the decoder 

20. Adapter (receiver side) 

21. Table of probability distributions of the SDC 
5 symbols 

22. Stack machine 

23. Stack memory 

24 . Java sink 

25 . Code generator 
10 26. JVM 



Abstract of the Disclosure 



A method and apparatus for its implementation are 
proposed that make it possible to eliminate redundancy 
in digitally represented texts with syntactic structure - 
The data reduction is brought about here by parsing the 
message according to the rules of the underlying syntax 
and subsequent entropy encoding of the decisions made to 
select the rules. If the text is a computer program^ a 
stack machine can be used to generate from the 
compressed representation either the original text or a 
program that can be executed directly. Among the areas 
of application is the transmission of computer programs 
in networks with limited bandwidth. 
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COMPRESSION OF DATA WITH SYNTACTIC STRUCTURE 



1. Introduction to the General Topic 

The coding of source files has important technical 
applications in the area of transmission and storage of 
information. The desire to have a low data rate in the 
transmission of information, and a small space 
requirement in the storage of information, has given 
rise to the demand for a form of information coding that 
is as brief as possible, i.e. one with little or, 
ideally, no redundancy. 

2 . State of the Art 



2.1. Forms of Redundancy 



Redundancy in streams of source symbols takes two basic 
forms: it can appear as a differential frequency of 
occurrence of the elements in the set of source symbols, 
and in the form of statistical dependencies between 
temporally separated source symbols, which usually are 
caused by source- specif ic formation laws for the 
construction of a text as a sequence of source symbols. 

3 . Disadvantages of Known Methods 

For the compression of messages from sources that suffer 
from redundancy and •emit source symbols that are not 
uniformly distributed but are statistically independent, 
a procedure according to Huffman [1] is known. Being 
specially intended for this type of source, the 
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procedure enables inaximal compression only under the ^ 

extremely favorable condition that the statistical \ 
properties of the source are known or can be reliably 
estimated . 

[ 

Whereas estimation of the probabilities of occurrence of [ 

individual source symbols, which is necessary for j 

Huffman coding, can be done with sufficient accuracy f 

even in the case of moderately long texts, the J 

estimation of statistical dependencies between I 

temporally separate source symbols as a rule requires I 

extensive observation of the source, which goes far \ 
beyond what is feasible both technically (storage 
capacity) and in terms of observation time. For this 

reason the known or similar procedures always involve ; 

making assumptions about the underlying formation law or - 
crucial characteristics thereof, and then of course are 

suitable only for sources with this formation law. This [ 

qualification also applies to the Lempel-Ziv (LZ) [ 

procedure for data compression [2] , which has been known I 

since the end of the 1970' s. This procedure is suitable | 

for the compression of arbitrary linear symbol [ 

sequences; that is, it can be applied with no knowledge | 

of the statistical properties of the information source. I 

The main areas of application today are the compression ; 

of pictorial data (GIF file format) and arbitrary files I 

(archiving programs gnuzip, pkzip) . In contrast to I 

Huffman coding, this procedure utilizes statistical I 
dependencies between temporally successive symbols, by 
taking into account repeatedly occurring subsequences of 
symbols (strings) . Hence LZ compression is suitable only 
for sources, the formation law of which leads to the 
frequent occurrence of particular strings. The 
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redundancy introduced into a text by following \ 

grammatical rules during its construction is not [ 

accessible to the known compression procedures because r 

of the recursive structure of grammatical rules, since f 

these procedures merely employ symbol frequencies or I 
repetitive chains of alphanumeric characters for 
compression. 

4, Definition of the Object of the Invention 

I 

The object of the present invention is to provide a [ 

method and apparatus for implementing the method that [ 

enable a text consisting of a stream of alphanumeric [ 

characters, and is constructed according to grammatical ; 
rules, to be encoded in such a way as to remove the 
redundancy produced as a result of the limitation of all 

possible character sequences by the grammar. ] 

This object is achieved by a method with the 
characteristics given in Claim 1 and two kinds of 

apparatus by means of which the characteristics given in I 
Claims 8 and 9 are achieved. 

5. Advantages of the Solution in Accordance with the Invention [ 

j 

with the solution in accordance with the invention, j 

enabling data reduction in dependence on the \ 

programming- language syntax employed, it now becomes \ 

possible in an especially advantageous manner to remove, > 

asymptotically almost completely, redundancy produced in f 
texts during their construction because of the need to 

follow certain grammatical rules. ! 
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The invention has proved this by analytical calculation 

of the capacity of its data stream, encoded in a syntax- > 

directed manner, using the example of a language with : 

characteristics typical of programming languages \ 

(mathematical expressions, if-then-else construct, | 

etc.). In this process it was shown that an appreciable I 

amount of the redundancy that conventional methods [' 
(Lempel-Ziv, Huffman) are designed to address has not 
previously been eliminated by these methods . 

The new method of syntax-directed coding, in contrast, [ 

is capable of reducing the amount of data by an I 

additional ca . one -half in comparison, for instance, to I 

a result obtained with the Lempel-Ziv compression I 

procedure. I 

Thus the method in accordance with the invention, 
together with the apparatus specified for its 
implementation, is predestined for all applications that 
involve the transmission or storage of syntactically 
structured texts. Examples include the transmission of | 
Java applets in the Internet or an intranet, the \ 
transmission or storage of Postscript files, the 
transmission and storage of MPEG4 -encoded video data, I 
and the use of higher protocols in communication. | 
Especially for the transmission of programs, such as ^ 
Java applets, syntax-directed coding (SDC) is ! 
particularly suitable because the parser is a component I 
of both a compiler and an SDC coder, so that SDC can be 
organically integrated into the data flow from the 
writing to the running of the program. 



6 . Mode of operation of the invention 
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6.1 Syntactically structured source [ 

The invention takes as a point of departure a formal 
language represented by a set of chains of alphanumeric 
characters. The text is put together according to a 
grammar assigned to the language. A grammar is both the \ 
mathematical system used to define a formal language, \ 
and a set of rules to determine the syntactic validity \ 
of a specific sentence. Only context-free grammars are 
considered here, because higher programming languages ■ 
(C, C++, Java, etc.) are described almost exclusively by ; 
context-free grammars. The following definitions apply: \ 

Definition 1 A content- free grarnmar is a 4 -tuple G = f 
(N, T, P, S) , where 

N is the set of non~ terminal symbols, \ 
T is the set of terminal symbols^ and 

P is the relation N x (N \J T) * , ' 
S is a special non~ terminal symbol, also called 
start symbol . 

Definition 2 An element {A, p) g P is called production I 
and is abbreviated A ^ f3 . 

\ 

A production (or derivation) is thus a replacement rule | 
for a non-terminal symbol A, such that this non-terminal ' 
symbol is replaced by a string (chain) j3 composed of 

(non- terminal and) terminal symbols, I 

I 

When there exist several productions (A, J3i) , (A, P2) , ' 
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(A, fin) e P, for this we write: 

I A 

I J3n I 

Definition 3 The set Pao ^ P of all possible derivations | 

for a non-teinninal symbol Ao ^ 

Pao - {{A,fi) & P : A = Ao} ; 

is also called the set of the alternatives for Aq. The ] 

decision to use a specific production {Ao,j3k] e Pad, {k = \ 
1,2, n) is called selection. 

Example 1 All arithmetic expressions in the variables a 

with the operations + and * as well as arbitrarily \ 

nested brackets (,) are a formal language generated by ; 

the grammar ^ 

G - {{E,T,F] , {a,+,*, (, ) ] ,P,E) . I 

P consists of the following productions : ^ 

E ~^ E^T s 

I T ; 

T T*F ' 

I F 
F (E) 

I a = 
3*3 + (3*3+3) *3, for example, is a valid sentence . 
In generating a sentence, further strings of non- 
terminal and, where appropriate, terminal symbols are 
derived by application of suitable productions to the 



Scarpeiii - PCT-GB99-00213 - Translated text. doc 



Page 7 



- 7 - 

non- terminal symbols until ultimately the string 
consists only of terminal symbols . In the example given 
above, the valid sentence consists only of the terminal 
symbols a, , * , { , ) . 

6.2 Principle of Syntax-Directed Coding 

Information always flows into a text being generated 
when during its construction decisions are made. 
Decisions are to be made when a particular one of 
several possible source symbols is to be chosen or, in 
the case of a syntactically structured source, out of 
several applicable productions A ~> y^i, A f52, ... 

precisely one production actually is applied (selection, 
cf . Definition 3) . 

SDC encodes, instead of the string of terminal symbols, 
the selections in the order in which they were made 
during the construction of this string (the text) . 

{cJi/ o'2, • • w cjn} is called SDC 

n = max { I PaI } 
A 

All the elements in each set of alternative Pa are 
numbered (arbitrarily) with SDC symbols ai e S : 

Definition 5 : Pa ^ ^ is an (arbitrary) injective 

mapping- . 

The concrete definition of Za can in general be dependent 



Definition 4 S = 

alphabet^ wi th 
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on Pa. 

As an illustipation of the invention the following 
exemplary embodiments are described in detail and shown 
in Figures 1 to 6 , as follows: 

Figure 1 is a block diagram of an SDC transmission 
system 

Figure 2 shows alternatives of the grammar G and 
selections to which SDC symbols have been attributed 

Figure 3 is a parse tree of the sentence 8*3+ (a * 
a + a) * a with a resulting linear SDC symbol sequence 
1212221211212222222 

Figure 4 is a flow diagram of the stack machine 

Figure 5 is an example of an SDC transmission 
system with receiver-side source-text output 

Figure 6 is an example of an SDC transmission 
system with executable Java applet or application as 
output 

The functioning of a transmission system with SDC is 
illustrated in Figure 1. In principle transmission and 
storage are equivalent with respect to the way the 
coding works . 

The source delivers a syntactically structured stream of 
symbols (source program) , An elementary component of the 
SDC encoder is a parser, the task of which is to 
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retrieve from the initially linear sequence of source \ 

symbols the grammatical structure of the sentence (the \ 

source program) . This grammatical structure is j 

represented in general by a parse tree, which in turn | 

serves as input to the encoder. The encoder encodes the [ 

parse tree to generate a linear sequence of SDC symbols, | 

X 

such that at each moment the current SDC symbol is [ 

output and transmitted in dependence on the current \ 

I 

selection at that moment. The SDC decoder first \ 

reconstructs the parse tree from the linear sequence of \ 

SDC symbols and then, by traversing the parse tree, \ 

reproduces the sequence of terminal symbols sent out [ 

from the source . I 

[ 

\ 

6 . 3 Encoding = 

I 

The elementary task of the parser comprises the \ 

reconstruction of the selections made by the source [ 

during construction of the sentence. A parsing step thus | 

amounts to determining the specific production {A, /S) e p 

Pa that the source had selected for constructing the text \ 

at this point. As stated in Definition 5, the encoder \ 

assigns to the selection thus determined the associated ^ 

SDC symbol C7)c = ( (A/ p) ) and attributes to the j, 

corresponding node in the parse tree. \ 

By traversing the parse tree, ultimately the linear I 

sequence of SDC symbols is generated and output. In the \ 

examples given here, a depth- first algorithm is always i 

used for this purpose. The process of encoding will now f 

be explained in greater detail with reference to an \ 

exemplary parse tree and the underlying grammar from \ 
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Example 1 . 

Example 2 The context-free grammar of Example 1 is 
assumed. All alternatives, i.e. all possibilities for 
deriving each non~ terminal symbol, are shown graphically 
in Figure 2. Because the greatest set of alternatives 
comprises precisely two elements , the SDC alphabet S = 
{l, 2} suffices . To each selection of all quantities of 
alternatives an SDC symbol is attributed. For example, 
the selection of the specific production T ^ T * F is 
assigned the SDC symbol 1. 

A valid sentence in this grammar is thus, for example: 
a*a + {a*a+a) *a. The parse tree for this sentence 
and the generation of the SDC symbol sequence by depth- 
first traversing are shown in Figure 3. 

6.4 SDC with arithmetic encoding 

The method that has just been described is efficient 
only if 

1. I pj = const V A, and 

2. P( (Ai = const V A, pt. 

where P{{Ax (3^)\ a) is the conditional probability of 
making the selection that generates /?i from the set of 
alternatives Pa- 

This deficiency can be alleviated by arithmetic encoding 
[3] of the SDC alphabet Z as stated in Claims 6 and 7. 
The probability distributions of the SDC symbols used 
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for arithmetic encoding should be adapted in an adapter 
according to Claim 10 in which case the adaptation must 

be carried out separately for each set of alternatives . ^ 

Instead of the linear sequence of SDC symbols, the bit j 

stream generated by the arithmetic encoder is [ 

transmitted. t 

I 

\ 

6 . 5 Decoding 

Decoding is the inverse of the encoding process. The ; 

decoder formats the SDC code back into the original J 

sentence (strings of terminal symbols) . It operates \ 

according to the procedure described in Claims 1, 3 and | 
4, 5 and is implemented by a stack machine [4] according 
to Claims 8 and 9 . 



A flow diagram for a method according to Claim 22 is i 

given in Figure 4 . \ 

To explain operation of the stack machine the following | 

abbreviations are employed: i 

y3fc String of terminal and non- terminal symbols [ 

cr SDC symbol \ 

V Terminal or non- terminal symbol [ 

S Start symbol {non- terminal symbol) i 

The decoding thus proceeds in the following steps {the I 
numbers correspond to the reference numerals shown in 
Figure 4 ) : 

1. The development of a valid sentence according to 
the grammar {a program) begins with the start 
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symbol S. The start symbol S is placed on the 
stack . 

2. The uppermost symbol V is read from the stack. 

3. Check whether is a non-terminal symbol. 

4 . From the linear input stream of SDC symbols the 
next SDC symbol a is read. 

5. In case V is a non- terminal symbol, the derivation 
with V is further developed. By way of a a 
selection {V,(3y:) - z~^{cr) is made for V from the set 
of alternatives ot V ^ - • - lySn- In accordance 
with the specific selection, V is then replaced by 
the string 

6. The string py, of symbols (terminal and/or non- 
terminal symbols) is placed on the stack. 

7. If V is a terminal symbol, it is output. 

8. Check whether the stack is empty. 

9. The stack machine terminates when the stack is 
empty . 

The strings of terminal symbols output successively, 
when linked together, finally yield the sentence 
originally encoded from the source. 

7. Exemplary Embodiment: Preferred Form of Application 

The two exemplary embodiments shown in Figures 5 and 6 
differ merely in the form in which data are sent to the 
receiver (output of source text corresponding to the 
method according to Claim 3 versus direct output of an 
executable Java applet or an executable Java 
application, corresponding to the method according to 
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Claim 4) . 

1 . Java source 

A Java source corresponds to an ASCII text based on 
the Java grammar, which consists of Java program 
text plus comments, blank spaces etc. 

^; ; 2 . SDC encoder 

V- The SDC encoder is composed of a compiler front end 

((3), (4) and(6)),a parse-tree processor (5), an 

i.::^ adapter (8) and modules for compression and 

concatenation of the generated data. 

3 . Scanner 

.jj; The scanner reads the source text, removes the text 

sites that are unimportant for the program such as 
documentation, comments, blank spaces etc. and 

r~ sends (terminal) symbols to the parser (4) . 

4 . Parser 

The parser interprets the (terminal) symbols from 
the scanner (3), tests whether the symbol sequences 
agree with the grammatical rules for Java and sets 
up the specific parse tree for this program. 

5. Parse-tree processing 

During the initialization of parse-tree processing 
all nodes of the parse tree are assigned SDC 
symbols specified for each type of node. The 
subsequent traversing of all nodes of the parse 
tree causes an SDC symbol to be output at each 
crossing of a node. These SDC symbols are sent to 
both the adapter (8) and the arithmetic coder (10) . 
In addition, the parse-tree processor transmits the 

i 

I 
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current node type to the adapter (8) . ; 

6. Symbol-table memory 

The symbol-table memory contains the identifiers of ^ 
the Java program text extracted by the parser (4) . 

7. Zip-compression device ; 
The content of the symbol table (6) is compressed ; 
by a method according to Lempel -Ziv-Welch [2] . ; 

8. Adapter (transmitter-side) 

During the process of initialization the adapter i 

enters initial probability distributions for each ' 

node type into the probability-distribution table j 

(9) . During processing of the SDC symbols received ! 

from the parse- tree processor (5) ^ and with \ 

reference to the current node type, at each step i 

the probability distribution of all SDC symbols of I 

the current node type is adapted and entered into \ 

the probability-distribution table (9) . The adapter ' 

I 

8 simultaneously makes available to the arithmetic i 

i 

encoder (10) the probability distribution ; 

associated with the SDC symbol currently being J 

processed- i 

I 

9. Table of the probability distributions of the SDC f 
symbols \ 
For each node type, the table contains a i 
probability distribution of SDC symbols that is 
peculiar to that node type. 

10. Arithmetic encoder 

The arithmetic encoder (3) receives from the parse- 
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tree processor (5) the next SDC symbol to be 

encoded and encodes it with reference to the 

probability distribution made available by the 

adapter (8) . The output of the arithmetic encoder 
is a binary data stream. 

11. Data concatenation 

The binary data stream from the arithmetic encoder 
(10) and the symbol table (6) compressed by the 
compressor (7) are combined to form a data packet. 

12 . HTTP server 

On the HTTP server the data packet created in the 
data concatenation (11) is deposited and made 
available to be called up from a mass storage 
device . 

13 - Intranet/Internet 



14. HTTP client (Internet browser) \ 

The browser has the task of downloading the data \ 

packet from an HTTP server (12) by way of the [ 

. i' 

intranet/Internet (13) and initiating the process j 

of decoding by the SDC decoder (15) . [ 

15. SDC decoder I 
The SDC decoder comprises substantially of the ; 
arithmetic decoder (17) and the stack machine (22) . \ 
The result of its operation is the sequence of ' 
(terminal) symbols that correspond to the original 

Java source code (1) (cf . 3 and 4) . 
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Data extractor 
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Here the binary data stream from the arithmetic 
encoder (10) and the symbol table compressed by the 
zip compressor (7) are extracted from the common 
data packet and sent on, the former to the 
arithmetic decoder (17) and the latter to the zip- 
decompressor (18) . 

17 . Arithmetic decoder 

The arithmetic decoder [3] adapt ively decodes the 
binary data stream produced by data extraction 
(16) , taking into account the probability 
distributions associated with the SDC symbols that 
have been made available by the receiver- side 
adapter (20) , and at each decoding step transmits 
an SDC symbol to the stack machine (22) . 

18 . Zip-decompressor 

The symbol table from the coder [6] , compressed by 
the Lempel-Ziv method (7) and received from the 
data extractor (16) , is unpacked and entered into 
the symbol -table memory of the decoder (19) . 

19- Symbol -table memory of the decoder 

20. Adapter (receiver- side) 

The initialization of the receiver- side adapter 20 
is analogous to that of the transmitter- side 
adapter (8) . While decoding proceeds, at each 
decoding step the adapter receives an SDC symbol, 
adapts the probability distribution of the SDC 
symbol of the current node type and enters the 
adapted probability distributions into the 
probability-distribution table (21) . At the same 
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time, the receiver- side adapter supplies to the ^ 
arithmetic decoder the probability distribution 

that is valid for the next decoding step. The ! 

selection of this distribution is determined by the I 

node type determined by the stack machine (22) and \ 

expected in the next decoding step. \ 

f 

21. Table of probability distributions of the SDC 
symbols 

For each node type the table contains an individual 
probability distribution of the SDC symbols of this 
node type . 

22 . Stack machine 

The initialization of the stack machine, the \ 

processing of the stack memory and the output of ; 

terminal symbols is represented in Figure 4 as a | 

flow diagram. In addition the stack machine \ 

transmits the following data to the adapter(20): i 

(a) the current node type, for adaptation of the I 
probability distributions of the SDC symbols of ^ 
thi s node type \ 

f 

(b) the next node type, the SDC-symbol probability S 
distribution of which the adapter (20) must supply [ 
to the arithmetic decoder (17) in the next decoding j 
step. f 

23 . Stack memory 

Stack memory for the stack machine. 
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Java sink 
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Retrieved Java program text without comments, 
superfluous blank spaces etc. 

25. Code generator 

The code generator corresponds to a compiler back 
end and generates byte code for the JVM (26) in 
dependence on the sequence of (terminal) symbols 
provided by the stack machine (22) . 

26. JVM 

The virtual machine for Java programs (Java Virtual 
Machine) . 
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Reference characters in Figure 4 

String of terminal and non- terminal symbols 
a SDC symbol 

V Terminal or non- terminal symbol 
S Start symbol (non- terminal symbol) 

1. The development of a valid sentence according to 
the grammar (a program) begins with the start 
symbol S. The start symbol S is placed on the 
stack . 

2. The uppermost symbol V is read from the stack. 

3. Check whether V is a non-terminal symbol. 

4. From the linear input stream of SDC symbols the 
next SDC symbol a is read. 

5. If V is a non- terminal symbol, the derivation is 
continued with V. By means of cr a selection (V, yftc) = 
z-Mcr) from the set of alternatives of -> AI 

\ pn is determined for V, According to the particular 
selection made, V is replaced by the string yft,. 

6. The string of symbols {terminal and/or non- 
terminal symbols) is placed on the stack. 

7. If y is a terminal symbol, it is output. 

8. Test whether the stack is empty. 

9. The stack machine terminates when the stack is 
empty . 

Reference characters in Figures 5 and 6 

1 . Java source 

2 . SDC encoder 

3 . Scanner 

4 . Parser 
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5 . Parse-tree processor 

6. Symbol- table memory 

7 . Zip -compress ion device 

8. Transmitter- side adapter 

9. Table of probability distributions 
symbols 

10. Arithmetic encoder 

11. Data concatenation 

12 . HTTP server 

13 . Intranet/Internet 

14. HTTP client (Internet browser) 

15 . SDC decoder 

16 . Data extractor 

17. Arithmetic decoder 

18. Zip-decompressor 

19. Symbol-table memory of the decoder 

20. Adapter (receiver side) 

21. Table of probability distributions 
symbols 

22 . Stack machine 

2 3 . Stack memory 

24 . Java sink 

25. Code generator 
2 6 . JVM 
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Claims 



Method for the compressed transmission and/or 
storage of a text represented by digital data, the 
structure of which is defined by a grammar, 
characterized in that 

a) prior to storage and/or transmission the text 
is converted into a linear sequence of symbols 
that specify the successive application of 
grammatical rules to form the text, and 

(b) to decode the text after the storage and/or 
transmission, for every symbol thus produced 
by syntax-directed coding (SDC) the 
grammatical rule corresponding to that 
particular SDC symbol is performed and 
particular output data are generated by this 
rule . 



Method according to Claim 1, | 
characterized in that to generate the SDC symbols [ 

[ 
i 

(a) by parsing the text a parse tree corresponding [ 
to the consecutive applications of the [ 
grammatical rules is generated, i 



(b) to each node in this parse tree there is 
attributed an SDC symbol, which out of all the 
rules permitted by the grammar at this site 
unambiguously identifies the particular rule 
that is actually used to generate the original 
text, and 
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(c) the SDC symbols are concatenated, according to \ 

a fixed order of traversing all nodes of the ; 

parse tree, to form a linear sequence of SDC j 

symbols. f 

I 

3. Method according to Claim 1, | 
characterized in that for decoding [ 

[ 

(a) the uppermost entry in a stack memory is | 
replaced according to the grammar production [ 
determined by the SDC symbol that has been ; 
input , 1 

\ 

I 

(b) parts of the grammatical rules that cannot yet 

be completely processed are deposited in a J 
stack memory, and S 

! 

(c) parts of the production for which substitution j 
is complete are output as part of the original j 
text . f 

\ 

i 

i 

4. Method according to Claim 1, : 

characterized in that for decoding ■ 

I 

(a) the uppermost entry in a stack memory is i 
replaced according to the grammar production 
determined by the SDC symbol that has been ; 
input , 

(b) parts of the grammatical rules that cannot yet 
be completely processed are deposited in a 
stack memory, and 
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(c) parts of the production for which substitution 
is complete are processed immediately to form 
part of an executable program for a real 
processor or a virtual machine. 

5. Method according to Claim 3 or 4, 

characterized in that a stack machine 

(a) is initialized by depositing a specific start 
symbol (non- terminal symbol) into the empty 
stack memory, 

(b) reads the uppermost symbol from the stack 
memory, 

(c) checks whether the read symbol is a terminal 
or a non- terminal symbol, 

(d) if it is a terminal symbol, outputs the symbol 
and, depending on whether additional symbols 
are present in the stack memory, either 
continues with 5b or terminates if the stack 
memory is empty, or 

(e) if it is a non- terminal symbol, reads the next 
SDC symbol from the input stream, 

(f) depending on what SDC symbol has been read, 
selects precisely one alternative (chain of 
terminal and/or non- terminal symbols) out of 
the set of alternatively applicable 
replacement rules (productions) that are valid 
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for the non-terminal symbol currently being 
processed, and 

(g) places this chain of terminal and/or non- 
terminal symbols into the stack memory and 
continues with 5b- 

6 . Method according to Claim 2 , 
characterized in that 

(a) to each node in the parse tree there is 
attributed an SDC symbol and the probability 
distribution of all the SDC symbols possible 
in this node, 

(b) the linear sequence of SDC symbols is 
subjected to entropy encoding in conformity 
with the associated probability distributions, 

(c) the entropy decoding is carried out with a 
probability distribution of SDC symbols 
identical to that used for the entropy 
encoding . 

7- Method according to Claim 6, 
characterized in that 

(a) the probability distribution of the rules that 
can be applied in a node, starting from an 
initial distribution, is adapted at each 
appearance of an SDC symbol in such a way that 
the probability of the SDC symbol that appears 
is increased and the probability of all other 
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symbols is correspondingly reduced, 

(b) the currently valid distribution of occurrence 
probabilities is assigned to the SDC symbols 
of the associated node type, 

(c) the probability distribution of all SDC 
symbols in a current node, together with the 
SDC symbol to be encoded, forms a model for an 
arithmetic encoding, 

(d) during decoding the end of the text is 
recognized by the fact that the stack memory 
is empty, and 

(e) an End-Of -Message (EOM) symbol required for 
arithmetic coding is eliminated. 

8. Apparatus for the method according to Claim 1, 
characterized by an encoder comprising 

(a) a scanner to transform text comprising a 
sequence of readable characters into a 
sequence of terminal symbols, 

(b) a parser to find grammatical rules, the 
successive application of which was originally 
used to generate the sequence of terminal 
symbols , 

(c) a mapper, which unambiguously associates 
syntax-directed symbols with the rules 
identified by the parser and outputs these 
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symbols in a fixed sequence, 
and a decoder comprising ! 

(a) a stack machine which, according to the ; 
uppermost symbol in the stack memory and, 

where appropriate, the adjacent SDC symbol, 
outputs the already fixed terminal symbol, or 
deposits the sequence of terminal and/or non- : 
terminal symbols associated with the current [ 
symbol into the stack memory, and [ 

t 

(b) a lexicon that replaces the terminal symbols I 

\ 

by chains of readable alphanumeric characters . [ 

9. Apparatus for the method according to Claim 3, ; 
characterized by an encoder comprising | 

i 

I 

a) a scanner to transform a program that is \ 
present in a source text or in a form derived [ 
from the source text by a preprocessor, into a > 
sequence of terminal symbols, f 

i 

(b) a parser to find the grammatical rules, the | 
successive application of which was originally | 
used to generate the sequence of terminal \ 
symbols, and j 

(c) a mapper, which unambiguously associates * 
syntax-directed symbols with the rules 
identified by the parser and outputs these 

symbols in a fixed sequence, ' 
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and a decoder comprising 

(a) a stack machine which, according to the 
uppermost symbol in the stack memory and, 
where appropriate, the adjacent SDC symbol, 
outputs the already fixed terminal symbol, or 
deposits the sequence of terminal and/or non- 
terminal symbols associated with the current 
symbol into the stack memory, and 

(b) a code generator that generates from the 
sequence of terminal symbols executable 
machine code, or intermediate code to be 
executed on a virtual machine . 

10. Apparatus for the method according to Claim 7, 
comprising 

(a) on a transmitter side 

i. a table that contains the probability 
distributions of the SDC symbols for each 
node type, the contents of which are 
established at initialization with fixed 
initial probability distributions for 
each node type, 

ii. an adapter that updates the probability 
distribution of the SDC symbols for the 
node type valid at any given moment with 
reference to the existing probability 
distribution, the SDC symbol to be 
encoded and the current node type, and 
enters this new probability distribution 
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into the table; and 

iii.an arithmetic coder that encodes each SDC 
symbol to be encoded with the currently 
valid probability distribution supplied 
by the adapter; 

(b) and on a receiver side 

i. a table that contains the probability 
distributions of the SDC symbols for each 
node type, the contents of which are 
established at initialization with fixed 
initial probability distributions for 
each node type, 

ii. an adapter that updates the probability 
distribution of the SDC symbols for the 
node type valid at any given moment, as 
established by the stack machine, with 
reference to the existing probability 
distribution, the SDC symbol to be 
encoded and the current node type, and 
enters this new probability distribution 
into the table, 

iii. an arithmetic decoder which, with 
reference to the currently valid 
probability distribution of the current 
node type supplied by the adapter, 
decodes the next SDC symbol and sends it 
to the stack machine for further 
processing . 
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Abstract 

A method and apparatus for its implementation are 
proposed that make it possible to eliminate redundancy 
in digitally represented texts with syntactic structure. 

The data reduction is brought about here by parsing the 
message according to the rules of the underlying syntax 
and subsequent entropy encoding of the decisions made to 
select the rules. If the text is a computer program, a 
stack machine can be used to generate from the 
compressed representation either the original text or a 
program that can be executed directly. 

Among the areas of application is the transmission of 
computer programs in networks with limited bandwidth. 
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' POWER OF ATTORNEY: I hereby appoint as my auomeys, with tull powers of subsmution duu ' ,j 
' prosecute^ this appiicacion and transact all business in che Paieiit and Trademark Office connecied therewith: r oni : 0823 



Mvtn D, Shuiman (19.4121 
Alien H Gerstein 
iVaie F. Scanxrll i (22.320 ) 
Edwifd O'Tooic (Z2,^rj) 
Michael F. Sorun QpMT)^ 
Trevor B. Joike r25.542) ^ 
Car! E. Moore, Jr. <26,4S'n 



Richard H. ATKlersoa C26,52t^) 
Pacrick D. Ertef (2^^7) 
/ames P, 2e!Ier ^"S.491) 
William £. McCraciten (3ai<?5) 
Richard A Schnurr (3Q,8$g) 
Amliony Nimmo (30^^) 
Christine A. Dutiiik (31,245) 




Jeffreys, Sharp ( 3 1. $79) 
Manin J. Hirsch (32.237) 
James J Napoli (IQg l) 
Richard M. U Barge (32.254 ) 
ti-Hsien Rin-Uures. M.D^MUiP) 
DuugJassC HochsietlcTjSXllQ) 
Robtrr M. Gcrsietn j34j^J^ 



David W, Clough 1(36,107) 
Richard A. Brandor^lT^) 
James A. Bight X2X422) 
Roger A. Heppermann C^;^^^ 
David A. Gass C2fiJ^ 
Gregory C. Mayer^JBS) 



Send correspondence to: I Naie Scarpelli 



FIRM NAME PHONE NO. STREET CITY & STATE 2IP CODE 

Ma rshall. 0 ' Toole. Gcrsteia, 'td3<)Q.&:m Tower 

">Iuf ray <St Bonin 312-474-6300 V _ 2.13 So ud i Wacker Dri ve Chicago, Illfoois 60606-6402 



1 


Full Name of First or 5ole Inventor 


Citizenship 
German 




Kesidcnce Address - Street 
Wangenerstrafle 75 


Posi Office Address - Smret 
WangenerstraBe 75 




City (Zip) 

\Scarnberg.823l9 


City (Zip) 

Starnberg 82319 




Smc or Couniry 

Federal Repibiic of Germany 




Siaw or Country 

Federal Republic of Germany 








Signature 

0^ (^Z^ ^ 



Second Joint Inventor, if any 


Cicizcnship 
German 


Residence Address - Street 

Josef-Frankl-Strasse 9a 


Pose Office Address - Sircet 

Josef-Frankl-Srrassc 9a 


Ovi (Zip) 

80995 M\mich XJft^ 


Ciry (Zip) 
80995 Munich 


State or Coanrry 

Federal Republic of GenTiany 


Siaic or Couniiy 

Federal Republic of Ged 
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Third Joint In vcnccr, if any 
_£iIANGSONG XIE. 


Cinzenship \ 
Chinese 


Residence Address - Sweet 
Maximilian-KoIbe-AIlee 15 


Posi Orficc Address - Street 
Maximilian-Koibe-Ailee 15 


City (Zip) 


Criy (Zip) 
81739 Munich 


Sraic or Country '"^ 
Federal Republic of Germany 


Scare or Country 

Federal Republic of Germany 




Stgnanijre i 





Fourth Joint Inventor- if any 


Ciiizenship 


Residence Address - Street 


Post OfTsce Address - Street 


City (Zip) 


City (Zip) 


State or Country 


Siate or Country 


Daic 


Signature 
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57 CFR 1.56. DUTY OF DISCLOSURE - INFORMATION MATERIAL TO PATENTAMMJ (Applicq 

(a) A patent by its very nature is effected with a public tnteresc. The public in^fSt i$ best served, and the most 
effective patent examination occurs when, at the nme an application is being examined, the Office is aware of and evaluates the 
teachings of all information maienai to patentability. Each individual associated with the filing and prosecution of a paieni 
application has a duty of candor and good faith in dealing with the Office, ^hich includes a duty to disclose to the Office all 
mformation known to that individual to be inaieriaJ to patentability as defined in this section. The duty to disclose mformaiion 
exists with respect lo each pending claim until the claim is canceled or withdrawn from consideration, or the application becomes 
abandoned. Information material to the patentability of a claim that is canceled or withdrawn from consideration need not be 
submitted if the information is not matenal to the pateniabiiity of any claim remainit^g under consideration in the application. 
There is no duty to submit information which is not material to the patentability of any existing claim. The duty to disclose all 
information known to be material to patentability is deemed to be satisfied if ail information known to be material to patentability 
of any claim issued in a patent was cited by the Office or submitted to the Office in the manner prescribed by §§ i.97(b)-(d) and 
1.9$. However, no patent will be granted on an applicadon in conneaion with which fraud on the Office was practiced or 
attempted or the duty of disclosure was violated through bad faith or intentional misconduct. The Office encourages applicants 
CO carefully examine: 

(1) prior art cited in search reports of a foreign patent office in a counterpart application, and 

(2) Che closest information over which individuals associated with tiie filing or prosecution of a patent 
application believe any pending claim patentability defines, to make sure that any material information 
contained therein is disclosed to the Office, 



Information relating to the following factual sinjations enumerated in 35 USC 102 and 103 may be considered material under 37 
CFR i -56(a), 

35 US, C. 102, CONDITIONS FOR PATENTABILITY: NOVELTY AND LOSS OF RIGHT TO PATENT 
A person shall be entitled to a patent unless - 

(a) the invention was icnown or used by others m this couatry, or patented or described in a printed publication 
in this or a foreign country, before the invention thereof by the applicant for patent, or 

(b) the invention was pateniea or described in a primed publication in this or a foreign country or in public use 
Of on sale in this country, more than one year prior lo the date of the application for patent in the United States, or 

(c) he has abandoned the invention, or 

(d) the invention was Tim patented or caused to be patented, or was the subject of an inventor's certificate, by 
the applicant or his iegai representatives or assigns in a foreign country pnor to the date of the applicauon for patent in this country 
on an application for patenc or inventor's certificate filed more than twelve months before the filing of the application m the United 
States, or 

(e) the invention was described in a patent granted on an application for patent by another filed in the United 
States before the invention thereof by the applicant for patent, or on an international application by another who has fulfilled the 
requirements of paragraph (1), (2), and (4) of section 37i(c) of this title before the invention thereof by the applicant for parent, 
or 

(f) he did not himself invent the subject matter sought to be patented, or 

(g) before the applicant's invention thereof the invention was made in this country by another who had not 
abandoned, suppressed, or concealed it. In decenninmg priority of invention there shall be considered not oniy the respecdve dates 
of conception anti reduaion to practice of the invention, but also the reasonable diligence of one who was first to conceive and 
last to reduce co practice, from a time prior to conception by the other. 

35 USX. 103, CONDITIONS FOR PATENTABILITY: NON^-OBVIOUS SUBJECT MATTER (Applicable Ponton) 

A patent may not be obtained though the invention is not identically disclosed or described as set fonh in seaion 
102 of this title, if the differences between the subject matter sought to be patented and the prior an are such that the subject matter 
as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said 
subject maner pertains. Patentability shall not be negatived by ihe manner m which the invention was made. 

Subject matter developed by another person, which qualifies as prior art only under subsection (f) or (g) of 
section 102 of this tide, shall not preclude patentability under ihi.*i section where the subject matter and the claimed invention were, 
at the dme the invention was made, owned by the same person or subject to an obligauon of assignment to the same person. 

3SUS.C 112. SPECIFICATION (AppikabU Ponion) 

The specification shall comain a written description of the invention, and of the manner and process of making 
and using it, in .such full, clear, concise, and exact terms as to enable any person skilled in the art to which it penains, or with 
which it is most nearly connected, to make and use the same, and shall set forth die best mode contemplated by the inventor of 
carrying out his invention. 



